Spotify CS Forecast Exercise

Carlos Amiel

2021-07-04

The Analysis

Using the data set provided and outlined below, we’d like you to build a forecast for customer service email volume at the daily level through December 31, 2021. The forecast will be used to determine our daily customer service staffing needs based on each day’s email volume.

The Data

Data Overview

Based on the information provided, here is what we know about our data:

  • The data set contains simulated data for Spotify customer service email volume, subscriptions, and MAUs, but the patterns within are similar to real data.
  • Email, subs, and MAU data are available from January 1, 2019 to May 31, 2021.
  • Forecast subs and MAU were estimated in March 2021.
  • You do not have to use every variable listed for your final model.

Data Import

First we’ll import the data from an Excel file.

We’ll double check our date formats and change as needed.

Next we’ll briefly summarize our data below. We can see that we have 1,096 total records across 12 variables: one date variable and 11 numeric variables.
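The import, date-formatting, and summary steps above can be sketched as follows. The file path is a placeholder and the column name `date` is inferred from the summary output below.

```r
library(readxl)  # Excel import
library(dplyr)   # data manipulation
library(skimr)   # quick data summaries

# Import the Excel file (path is a placeholder)
cs_data <- read_excel("CS_Forecast_Exercise_Data.xlsx")

# Ensure the date column is a proper Date rather than a datetime
cs_data <- cs_data %>%
  mutate(date = as.Date(date))

# Summarize: 1,096 rows, 12 columns (1 date, 11 numeric)
skim(cs_data)
```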

Data summary
Name CS_Forecast_Exercise_Data
Number of rows 1096
Number of columns 12
_______________________
Column type frequency:
Date 1
numeric 11
________________________
Group variables None

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2019-01-01 2021-12-31 2020-07-01 1096

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
email 214 0.80 9079 2299.08 4691 7661 8814.5 9940.5 29920 ▇▃▁▁▁
subs 214 0.80 50948359 7276628.25 39516647 44673674 50993731.0 56826297.0 63146937 ▇▇▇▇▇
subs_standard 214 0.80 22501460 2311921.40 17823463 20564774 22252926.5 24491684.5 26965404 ▂▅▇▇▂
subs_student 214 0.80 3669271 169318.01 3367301 3527783 3663944.5 3840557.5 3915350 ▃▆▃▃▇
subs_family 214 0.80 20738998 4015456.70 14237991 16827842 21159042.5 24282939.2 26770934 ▇▅▅▆▇
f_subs 821 0.25 65675808 2453856.74 61771410 63530832 65669332.0 67783773.0 70300852 ▇▆▆▇▃
f_subs_standard 821 0.25 27037476 596568.18 26311010 26566314 26675257.0 27502558.5 28722821 ▇▁▅▁▁
f_subs_student 821 0.25 3418401 23278.42 3384047 3394838 3418636.0 3443110.0 3453084 ▇▅▃▅▇
f_subs_family 821 0.25 27713511 874456.22 26288684 26917562 27674145.0 28473306.5 29306612 ▇▆▆▆▆
mau 214 0.80 70419337 7816536.85 58437292 62546144 70435576.5 76536461.8 83248627 ▇▃▅▆▆
f_mau 821 0.25 87005037 2433158.16 82956358 85119070 86601582.0 88799819.5 92609723 ▅▇▆▅▂

Time Series Visuals

We’ll use this information to visually inspect date, email, subs, subs_standard, subs_student, and subs_family as a collection of time series. What we can see immediately is that subs and subs_family both follow fairly linear trends without visually strong seasonal patterns. subs_standard, on the other hand, exhibits both a linear trend and what looks like monthly seasonality. Meanwhile, subs_student shows steadily increasing volume that peaks in early 2020 and steadily declines thereafter.
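A sketch of this multi-panel inspection using timetk (the data frame name `cs_data` is an assumption; column names come from the data summary):

```r
library(dplyr)
library(tidyr)
library(timetk)

# Reshape the series of interest into long format and plot one panel each
cs_data %>%
  select(date, email, subs, subs_standard, subs_student, subs_family) %>%
  pivot_longer(-date, names_to = "series", values_to = "value") %>%
  group_by(series) %>%
  plot_time_series(date, value, .facet_ncol = 2, .smooth = FALSE)
```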

Next we’ll want to focus on our outcome variable, email, to more closely inspect the time series for visual cues that may be helpful in forecasting, such as seasonality.

We can see from the initial plot that there are various spikes in volume exceeding roughly 20k emails. Inspecting the dates, there doesn’t seem to be a common date pattern behind these spikes. There doesn’t appear to be a strong overall trend either.

Let’s plot our time series again, this time annotating apparent anomalies. Inspecting the dates of these anomalies confirms there is no recurring date pattern, which tells us we may benefit from cleaning them up.
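One way to annotate apparent anomalies is timetk’s anomaly diagnostics plot, a sketch of which is below (the default STL-based detection settings are an assumption):

```r
library(dplyr)
library(timetk)

# Flag and plot anomalies in daily email volume using STL decomposition
cs_data %>%
  filter(!is.na(email)) %>%
  plot_anomaly_diagnostics(date, email)
```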

If we now clean up our email time series, we can re-inspect the data visually. We can still confirm a lack of strong trend, but to explore seasonality we’ll need to plot our data in a different manner.
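The cleaning step could be done with `ts_clean_vec()`, which replaces outliers and interpolates gaps; a sketch, where the weekly period is an assumption based on the seasonality discussed next:

```r
library(dplyr)
library(timetk)

# Replace outlier spikes (and interpolate gaps) in the email series
cs_clean <- cs_data %>%
  filter(!is.na(email)) %>%
  mutate(email = ts_clean_vec(email, period = 7))
```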

In our next plot we’ll explore seasonal diagnostics more closely. Weekly seasonality does seem apparent with Saturday and Sunday representing down days, and the work week a relatively steady plateau of volume.
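The seasonal diagnostics described above can be produced with a one-liner (assuming `cs_clean` holds the cleaned series):

```r
library(timetk)

# Box plots of email volume by day of week, week of year, month, etc.
cs_clean %>%
  plot_seasonal_diagnostics(date, email)
```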

Forecast

Next we’ll set up a forecast framework where various models will be trained and evaluated.

First we’ll split our data into training and testing portions. This will allow us to evaluate our models on data they have not seen.
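A sketch of the split, using a time-ordered holdout rather than a random one; the 177-day assessment window is an assumption matching the calibration tibbles shown later:

```r
library(timetk)
library(rsample)

# Hold out the most recent 177 days for testing; train on everything prior
splits <- cs_clean %>%
  time_series_split(date_var = date, assess = 177, cumulative = TRUE)

train <- training(splits)
test  <- testing(splits)
```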

Next we’ll create multiple models to evaluate.
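The five candidate models match the descriptions in the model table below; a sketch of their specifications (the exact formulas and engines are assumptions):

```r
library(modeltime)
library(parsnip)

# 1. Auto-ARIMA
m_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(email ~ date, data = train)

# 2. Boosted ARIMA (ARIMA with XGBoost modeling the errors)
m_arima_boost <- arima_boost() %>%
  set_engine("auto_arima_xgboost") %>%
  fit(email ~ date + as.numeric(date), data = train)

# 3. Exponential smoothing (ETS)
m_ets <- exp_smoothing() %>%
  set_engine("ets") %>%
  fit(email ~ date, data = train)

# 4. Prophet
m_prophet <- prophet_reg() %>%
  set_engine("prophet") %>%
  fit(email ~ date, data = train)

# 5. Linear model with trend and day-of-week terms
m_lm <- linear_reg() %>%
  set_engine("lm") %>%
  fit(email ~ as.numeric(date) + factor(lubridate::wday(date, label = TRUE)),
      data = train)
```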

Now we’ll add these to a model table structure that allows us to more easily handle model evaluation.
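Collecting the fitted models into a modeltime table is a single call (model object names are the assumed ones from the fitting sketch):

```r
library(modeltime)

# Gather the five fitted models for joint calibration and evaluation
models_tbl <- modeltime_table(
  m_arima, m_arima_boost, m_ets, m_prophet, m_lm
)
```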

# Modeltime Table
# A tibble: 5 x 3
  .model_id .model   .model_desc                             
      <int> <list>   <chr>                                   
1         1 <fit[+]> ARIMA(4,0,0)(0,1,2)[7]                  
2         2 <fit[+]> ARIMA(1,0,3)(0,1,1)[7] W/ XGBOOST ERRORS
3         3 <fit[+]> ETS(M,N,M)                              
4         4 <fit[+]> PROPHET                                 
5         5 <fit[+]> LM                                      

Now we’ll evaluate the models on the test data and add the calibration results to our model table structure.
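The calibration step computes out-of-sample residuals on the test split and stores them alongside each model:

```r
library(modeltime)
library(rsample)

# Attach test-set predictions and residuals to the model table
calib_tbl <- models_tbl %>%
  modeltime_calibrate(new_data = testing(splits))
```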

# Modeltime Table
# A tibble: 5 x 5
  .model_id .model   .model_desc                         .type .calibration_data
      <int> <list>   <chr>                               <chr> <list>           
1         1 <fit[+]> ARIMA(4,0,0)(0,1,2)[7]              Test  <tibble [177 × 4…
2         2 <fit[+]> ARIMA(1,0,3)(0,1,1)[7] W/ XGBOOST … Test  <tibble [177 × 4…
3         3 <fit[+]> ETS(M,N,M)                          Test  <tibble [177 × 4…
4         4 <fit[+]> PROPHET                             Test  <tibble [177 × 4…
5         5 <fit[+]> LM                                  Test  <tibble [177 × 4…

Now we can use this framework to visually inspect our models.
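The visual inspection described above can be sketched as a forecast over the test window overlaid on the actuals:

```r
library(modeltime)
library(rsample)

# Plot test-period forecasts (with confidence intervals) against actuals
calib_tbl %>%
  modeltime_forecast(
    new_data    = testing(splits),
    actual_data = cs_clean
  ) %>%
  plot_modeltime_forecast()
```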

All models have wide confidence intervals and don’t seem to capture the decreasing trend beginning approximately April 2021. Both ARIMA models capture the weekly patterns well and deviate the least in the period after April 2021.

Now we should also take a look at our error metrics in tabular form. The boosted ARIMA model has the lowest mean absolute error and the highest R-squared.
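The accuracy table below comes from the calibrated model table:

```r
library(modeltime)

# Tabulate out-of-sample error metrics (MAE, MAPE, RMSE, R-squared, ...)
calib_tbl %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy()
```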

Accuracy Table
.model_id .model_desc .type mae mape mase smape rmse rsq
1 ARIMA(4,0,0)(0,1,2)[7] Test 1355.06 13.54 1.09 14.19 1808.05 0.23
2 ARIMA(1,0,3)(0,1,1)[7] W/ XGBOOST ERRORS Test 1258.45 12.71 1.01 13.08 1705.33 0.26
3 ETS(M,N,M) Test 1958.16 21.89 1.57 18.99 2328.88 0.24
4 PROPHET Test 1489.10 16.40 1.20 15.09 1984.55 0.07
5 LM Test 1633.22 15.70 1.31 16.90 2135.07 0.02

Now we’ll refit these models to the entire data set and forecast forward 7 months to the end of 2021. All models except our linear model provide reasonable step-ahead forecasts for daily email volume through the end of 2021.
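Refitting on the full series and projecting forward is again a short pipeline:

```r
library(modeltime)

# Refit each model on all available data, then forecast 7 months ahead
calib_tbl %>%
  modeltime_refit(data = cs_clean) %>%
  modeltime_forecast(h = "7 months", actual_data = cs_clean) %>%
  plot_modeltime_forecast()
```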

Conclusion / Next Steps

  • All models except our linear model capture the weekly pattern of email volume well and provide reasonable forecasts
  • In testing, the ARIMA and boosted ARIMA models performed the best
  • Choose a model considering accuracy, interpretability, and computational cost
  • Develop further studies based on the model(s) chosen
  • Develop a computational workflow based on the model(s) chosen

Epilogue

As an example, we’ll assume that a consensus has been reached to explore the regression model further. To do this, the model is isolated and further developed by adding features such as nominal day of week and holiday indicators. From this we can view our model coefficients for a preliminary confirmation of seasonal effects, as well as those of holidays.
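A sketch of the feature-enhanced linear model as a tidymodels workflow; the choice of a US holiday calendar and the specific preprocessing steps are assumptions:

```r
library(workflows)
library(recipes)
library(parsnip)
library(rsample)

# Recipe: linear trend, day-of-week factor, and holiday indicators
lm_recipe <- recipe(email ~ date, data = training(splits)) %>%
  step_mutate(trend = as.numeric(date)) %>%
  step_date(date, features = "dow", label = TRUE) %>%
  step_holiday(date, holidays = timeDate::listHolidays("US")) %>%
  step_rm(date) %>%
  step_dummy(all_nominal_predictors())

lm_wflow <- workflow() %>%
  add_recipe(lm_recipe) %>%
  add_model(linear_reg() %>% set_engine("lm"))

lm_fit <- fit(lm_wflow, data = training(splits))

# Inspect coefficients for day-of-week and holiday effects
broom::tidy(workflows::extract_fit_parsnip(lm_fit))
```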

We can also check whether the model has improved with the added features. Using our previous model as a comparison, we can see that our RMSE has improved from approximately 2130 down to about 1750. This brings the linear model roughly in line with the best time series model in terms of RMSE and R-squared. We now have two viable choices for forecasting: an ARIMA variant as well as a more parsimonious linear model.

.metric .estimate
rmse 1752.1464753
rsq 0.3553691